pacman::p_load(jsonlite,dplyr,tidyr,stringr,lubridate,tidyverse,readtext,ggplot2,visNetwork,stringr,ggpubr, igraph, patchwork,igraph,ggraph,ggrepel)Take Home 3

1. Introduction
FishEye International monitors business records of commercial fishing operators in Oceanus to identify and prevent illegal fishing. Analysts work with company data, including ownership, shareholders, transactions, and products/services, to build the CatchNet Knowledge Graph.
Last year, SouthSeafood Express Corp was caught illegally fishing, leading to its closure. FishEye wants to understand the temporal patterns and infer how the fishing market reacted to this event, as some businesses may have tried to capture SouthSeafood’s market share, while others may have become more cautious about illegal activities.
FishEye aims to develop visualization tools for CatchNet to identify influential people in business networks, considering the varied and changing shareholder and ownership relationships.
2. Mini-Challenge 3: Temporal Analysis
2.1 Tasks and Questions:
Develop an approach using visual analytics to highlight temporal patterns and changes in corporate structures, focusing on identifying the most active individuals and businesses.
Utilize visualizations to display typical and atypical business transactions, such as mergers and acquisitions, and infer the underlying motivations behind changes in their activity levels.
Data Source: VAST Challenge 2024: Mini-Challenge 3
3. Data
The MC3 dataset is a comprehensive collection comprising 60,520 nodes (entities) and 75,817 edges (relationships or connections) organized into 4,782 distinct components. The nodes in this dataset represent various types of entities, including individuals (Person), chief executive officers (CEO), companies, and other organizational structures.
On the other hand, the edges capture different types of relationships or interactions between these nodes. Some examples of edge types include shareholdership (ownership of shares in a company), beneficial ownership (enjoying the benefits of owning a property or asset without being the legal owner), and potentially other forms of associations or transactions.
The dataset is structured such that the nodes key contains a list representing all the node entities, with each node carrying attributes or properties that describe its characteristics, such as ID, type, country, revenue, founding date, and potentially other relevant details.
Correspondingly, the links key holds a list that represents all the edges or connections between these nodes. Each edge entry typically includes properties like the edge type, start and end dates (if applicable), and identifiers for the source and target nodes involved in the relationship.
With this comprehensive dataset capturing both node entities and their interconnections through various types of edges, researchers and analysts can conduct in-depth analyses to uncover patterns, understand the dynamics of relationships, and gain insights into the complex network of entities and their interactions.
3.1 Data Preparations
- jsonlite: This package provides functionality for parsing and generating JSON data in R.
- dplyr: A part of the tidyverse, dplyr provides a consistent set of verbs for data manipulation, making it easier to transform and summarize data frames.
- tidyr: Another tidyverse package, tidyr helps in creating tidy data sets by providing functions for reshaping data frames.
- stringr: This package provides a cohesive set of functions for string manipulation and regular expressions in R.
- lubridate: lubridate is designed to make it easier to work with date-time data in R.
- tidyverse: The tidyverse is a collection of R packages designed for data science, including dplyr, ggplot2, tidyr, and others.
- readtext: readtext makes it easier to import and handle text data in R, particularly for text mining and analysis.
- ggplot2: Part of the tidyverse, ggplot2 is a powerful data visualization package for creating complex and publication-ready plots.
- visNetwork: visNetwork is a package for creating interactive network visualizations in R, using the vis.js library.
- ggpubr: ggpubr provides easy-to-use functions for creating publication-ready plots and combining multiple ggplot2 plots into a single figure.
- igraph: igraph is a collection of network analysis tools for creating and analyzing graphs and networks in R.
patchwork: patchwork is a package for composing multiple ggplot2 plots into a single figure, with easy layout control.
igraph: A package for creating and analysing graphs and networks.
ggraph: A package for creating graph-based data visualisations using the ‘ggplot2’ syntax.
ggrepel: A package for automatically adjusting text labels to avoid overlapping in ‘ggplot2’ visualisations.
Click to show code
Click to show code
mc3 <- fromJSON("data/mc3.json")To effectively perform temporal analysis on such data, it can be beneficial to separate the data set into two distinct data frames: one for the node entities and another for the relationships or edges between them.
Click to show code
# Load necessary libraries
library(dplyr)
library(tidyr)
library(stringr)
# Assuming mc3 is already loaded as a list containing nodes dataframe
mc3_nodes <- as_tibble(mc3$nodes) %>%
mutate(
id = as.character(id),
type = as.character(type),
country = as.character(country),
ProductServices = as.character(ProductServices),
revenue_omu = as.numeric(revenue),
head_of_org = as.character(HeadOfOrg),
TradeDescription = as.character(TradeDescription),
PointOfContact = as.character(PointOfContact),
id = gsub("^[0-9]+\\.\\s*", "", id), # Clean up IDs
revenue_omu = ifelse(is.na(revenue_omu), 0, revenue_omu)
) %>%
mutate(
type = ifelse(type == "Entity.Person", "Entity.Person.Person", type) # Rename Entity.Person
) %>%
rename(
last_edited_by = `_last_edited_by`,
last_edited_date = `_last_edited_date`,
date_added = `_date_added`,
raw_source = `_raw_source`,
algorithm = `_algorithm`
) %>%
select(everything())
# Display the first few rows of the updated dataframe
print(head(mc3_nodes))# A tibble: 6 × 17
type country ProductServices PointOfContact HeadOfOrg founding_date revenue
<chr> <chr> <chr> <chr> <chr> <chr> <dbl>
1 Entity… Uziland Unknown Rebecca Lewis Émilie-S… 1954-04-24T0… 5995.
2 Entity… Mawala… Furniture and … Michael Lopez Honoré L… 2009-06-12T0… 71767.
3 Entity… Uzifri… Food products Steven Robert… Jules La… 2029-12-15T0… 0
4 Entity… Islava… Unknown Anthony Wyatt Dr. Víct… 1972-02-16T0… 0
5 Entity… Oceanus Unknown Tracy Schmidt Jacques … 1954-04-06T0… 4747.
6 Entity… Imazam Fish, crustace… Corey Moore J… Thierry … 2031-09-30T0… 46567.
# ℹ 10 more variables: TradeDescription <chr>, last_edited_by <chr>,
# last_edited_date <chr>, date_added <chr>, raw_source <chr>,
# algorithm <chr>, id <chr>, dob <chr>, revenue_omu <dbl>, head_of_org <chr>
(mc3_nodes$type) %>% unique()[1] "Entity.Organization.Company"
[2] "Entity.Organization.LogisticsCompany"
[3] "Entity.Organization.FishingCompany"
[4] "Entity.Organization.FinancialCompany"
[5] "Entity.Organization.NewsCompany"
[6] "Entity.Organization.NGO"
[7] "Entity.Person.Person"
[8] "Entity.Person.CEO"
# Further processing to separate the 'type' column
mc3_nodes <- mc3_nodes %>%
separate(type, into = c("type_1", "type_2", "type_3"), sep = "\\.", fill = "right", extra = "drop")
# Display the first few rows of the updated dataframe
print(head(mc3_nodes))# A tibble: 6 × 19
type_1 type_2 type_3 country ProductServices PointOfContact HeadOfOrg
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Entity Organization Company Uziland Unknown Rebecca Lewis Émilie-S…
2 Entity Organization Company Mawalara Furniture and … Michael Lopez Honoré L…
3 Entity Organization Company Uzifrica Food products Steven Robert… Jules La…
4 Entity Organization Company Islavara… Unknown Anthony Wyatt Dr. Víct…
5 Entity Organization Company Oceanus Unknown Tracy Schmidt Jacques …
6 Entity Organization Company Imazam Fish, crustace… Corey Moore J… Thierry …
# ℹ 12 more variables: founding_date <chr>, revenue <dbl>,
# TradeDescription <chr>, last_edited_by <chr>, last_edited_date <chr>,
# date_added <chr>, raw_source <chr>, algorithm <chr>, id <chr>, dob <chr>,
# revenue_omu <dbl>, head_of_org <chr>
(mc3_nodes$type_3) %>% unique()[1] "Company" "LogisticsCompany" "FishingCompany" "FinancialCompany"
[5] "NewsCompany" "NGO" "Person" "CEO"
Click to show code
mc3_edges <- as_tibble(mc3$links) %>%
mutate(
type = as.character(type),
type_new = stringr::str_extract(type, "[^.]+$"),
source = as.character(source),
target = as.character(target),
last_edited_by = as.character(`_last_edited_by`),
last_edited_date = as.Date(`_last_edited_date`),
date_added = as.Date(`_date_added`),
raw_source = as.character(`_raw_source`),
algorithm = as.character(`_algorithm`)
) %>%
mutate(
start_date = as.Date(start_date),
end_date = if_else(is.na(end_date), as.Date(NA), as.Date(end_date)),
id = target
) %>%
# Remove `_last_edited_by`, `_last_edited_date`, `_date_added`, `_raw_source`, `_algorithm` if they exist
select(-c(`_last_edited_by`, `_last_edited_date`, `_date_added`, `_raw_source`, `_algorithm`)) %>%
rename(
last_edited_by = last_edited_by,
last_edited_date = last_edited_date,
date_added = date_added,
raw_source = raw_source,
algorithm = algorithm
) %>%
select(everything())Click to show code
mc3_edges <- mc3_edges %>%
mutate(
start_date = as.POSIXct(start_date),
last_edited_date = as.POSIXct(last_edited_date),
date_added = as.POSIXct(date_added),
end_date = if_else(is.na(end_date), as.POSIXct(NA), as.POSIXct(end_date)),
id = target
)
mc3_nodes <- mc3_nodes %>%
mutate(
last_edited_date = as.POSIXct(last_edited_date),
date_added = as.POSIXct(date_added),
founding_date = as.POSIXct(founding_date),
dob = as.POSIXct(dob)
)Click to show code
mc3_edges [duplicated(mc3_edges ),]# A tibble: 0 × 13
# ℹ 13 variables: start_date <dttm>, type <chr>, source <chr>, target <chr>,
# key <int>, end_date <dttm>, type_new <chr>, last_edited_by <chr>,
# last_edited_date <dttm>, date_added <dttm>, raw_source <chr>,
# algorithm <chr>, id <chr>
mc3_nodes [duplicated(mc3_nodes ),]# A tibble: 0 × 19
# ℹ 19 variables: type_1 <chr>, type_2 <chr>, type_3 <chr>, country <chr>,
# ProductServices <chr>, PointOfContact <chr>, HeadOfOrg <chr>,
# founding_date <dttm>, revenue <dbl>, TradeDescription <chr>,
# last_edited_by <chr>, last_edited_date <dttm>, date_added <dttm>,
# raw_source <chr>, algorithm <chr>, id <chr>, dob <dttm>, revenue_omu <dbl>,
# head_of_org <chr>
Click to show code
nodes <- mc3_nodes %>%
select(
type_2, type_3, id, dob, country,
head_of_org, revenue, last_edited_by,
last_edited_date, date_added, raw_source, algorithm
)
edges <- mc3_edges %>%
select(
type_new, id, start_date, end_date,
last_edited_by, last_edited_date, date_added,
raw_source, algorithm
)What are the Company Types?
# Count the unique values in type_2
type_3_unique_counts <- mc3_nodes %>%
group_by(type_3) %>%
summarise(count = n()) %>%
arrange(desc(count))
# Display the summary table
print(type_3_unique_counts)# A tibble: 8 × 2
type_3 count
<chr> <int>
1 Person 50356
2 Company 7927
3 CEO 1293
4 FishingCompany 600
5 LogisticsCompany 311
6 FinancialCompany 23
7 NGO 5
8 NewsCompany 5
duplicate_counts <- mc3_edges %>%
group_by(target, type_new) %>%
summarise(count = n(), .groups = 'drop') %>%
filter(count > 1)
# Display the result
print(duplicate_counts)# A tibble: 13,220 × 3
target type_new count
<chr> <chr> <int>
1 6. America Transit Plc Shareholdership 2
2 Abbott, Mcbride and Edwards BeneficialOwnership 2
3 Abbott, Mcbride and Edwards Shareholdership 6
4 Abbott, Mcbride and Edwards WorksFor 3
5 Abbott-Harrison BeneficialOwnership 4
6 Abbott-Harrison Shareholdership 2
7 Abbott-Ibarra Shareholdership 4
8 Abbott-Sullivan Shareholdership 7
9 Abbott-Sullivan WorksFor 3
10 Acevedo PLC BeneficialOwnership 2
# ℹ 13,210 more rows
duplicate_counts <- mc3_edges %>%
group_by(source, target, type_new) %>%
summarise(count = n(), .groups = 'drop') %>%
filter(count > 1)
# Display the result
print(duplicate_counts)# A tibble: 2 × 4
source target type_new count
<chr> <chr> <chr> <int>
1 Akira Hernandez PregolyaDredge Logistics Incorporated BeneficialOwnersh… 2
2 Isla Davis Parks Ltd BeneficialOwnersh… 2
Extracting the top 10 id to see the different company types in edges and nodes file.
Click to show code
top_10_ids <- edges %>%
group_by(id) %>%
summarize(count = n()) %>%
arrange(desc(count)) %>%
slice(1:10)Create a bar chart for nodes and edges.
Click to show code
barchart_type_counts_nodes <- mc3_nodes %>%
count(type_3, sort = TRUE)
barchart_type_counts_edges <- mc3_edges %>%
count(type_new, sort = TRUE)
b1 <- ggplot(barchart_type_counts_nodes, aes(x = reorder(type_3, -n), y = n)) +
geom_bar(stat = "identity", fill = "lightblue") +
labs(
title = "Distribution of Nodes Types",
x = "Nodes Type",
y = "Count"
) +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 0, hjust = 0.8, size = 7)
)
b2 <- ggplot(barchart_type_counts_edges, aes(x = reorder(type_new, -n), y = n)) +
geom_bar(stat = "identity", fill = "lightblue") +
labs(
title = "Distribution of Edges Types",
x = "Edge Type",
y = "Count"
) +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 0, hjust = 0.5, size = 7)
)combined_plot <- b1 / b2
print(combined_plot)
Nodes Type:
‘
Person,’ has count exceeding 50,000 and is the most common entities in the network. ‘Companies’ numbering around 5,000‘
CEO’ are present but much fewer, reflecting a small proportion of individuals with this specific role.Other organization types, such as ‘
FishingCompany,’ ‘LogisticsCompany,’ ‘FinancialCompany,’ ‘NewsCompany,’ and ‘NGO,’ are present in minimal numbers.‘
Shareholdership,’ has over 35,000 instances, indicating ownership stakes in companies as a primary form of interaction.‘
BeneficialOwnership’ is the second most common relationship, with around 20,000 instances. ‘Worksfor’ is slightly lesser than beneficialownership with around 15,000 instances.‘
FamilyRelationship’ connections are the least represented, suggesting that familial ties are not as prominent as business and ownership relations in the dataset.Companies and Relationship to Types
Network Graph
Click to show code
library(dplyr)
library(igraph)
library(visNetwork)
# Assuming mc3_edges is already loaded as a dataframe
# Get the top 20 companies based on the count of edges
top_companies <- mc3_edges %>%
count(id, sort = TRUE) %>%
top_n(20, wt = n)
# Filter edges to include only those involving the top 20 companies
filtered_edges <- mc3_edges %>%
filter(id %in% top_companies$id) %>%
select(id, type_new)
# Prepare edges for graph creation
edges_for_graph <- filtered_edges %>%
rename(from = id, to = type_new)
# Create a bipartite graph
bipartite_graph <- graph_from_data_frame(d = edges_for_graph, directed = FALSE)
# Assign types for bipartite mapping
V(bipartite_graph)$type <- bipartite_mapping(bipartite_graph)$type
# Prepare nodes for visualization
nodes_vis <- data.frame(
id = V(bipartite_graph)$name,
label = V(bipartite_graph)$name,
group = ifelse(V(bipartite_graph)$type, "Type", "Company")
)
# Prepare edges for visualization
edges_vis <- igraph::as_data_frame(bipartite_graph, what = "edges")
# Create and customize the visNetwork graph
vis1 <- visNetwork(nodes_vis, edges_vis, width = "100%", height = "800px") %>%
visNodes(shape = "dot", scaling = list(label = list(enabled = TRUE))) %>%
visEdges(arrows = "none", color = list(color = "lightgray", highlight = "red")) %>%
visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE) %>%
visLegend() %>%
visGroups(groupname = "Type", shape = "dot", color = "blue") %>%
visGroups(groupname = "Company", shape = "dot", color = "green") %>%
visLayout(randomSeed = 21) %>%
visInteraction(navigationButtons = TRUE) %>%
visPhysics(stabilization = FALSE, enabled = FALSE)# Display the graph
vis1The graph helps in understanding the complex web of interactions in the dataset, highlighting how entities are interconnected through various types of relationships. Each node represents an entity, such as a company or individual, and the edges depict the relationships between them, including shareholdership, beneficial ownership, and employment links. For example,
Mosley and Sons has connection to shareholdership, worksfor and beneficial ownership while Nguyen, Shelton and Hayes only has connection to shareholdership and beneficial ownership.Overlapping Business Types?
Click to show code
top_10_ids <- mc3_edges %>%
group_by(id) %>%
summarize(count = n()) %>%
arrange(desc(count)) %>%
slice(1:10)
filtered_edges <- mc3_edges %>%
filter(id %in% top_10_ids$id)
type_new_counts <- filtered_edges %>%
group_by(id, type_new) %>%
summarise(count = n(), .groups = "drop") %>%
arrange(desc(count))
type_new_colors <- c("Shareholdership" = "pink",
"BeneficialOwnership" = "lightblue",
"WorksFor" = "lightgreen")
wrap_text <- function(text, width) {
wrapped_text <- str_replace_all(text, "-", " ")
wrapped_text <- str_wrap(wrapped_text, width = width)
wrapped_text <- str_replace_all(wrapped_text, " ", "-")
return(wrapped_text)
}
type_new_counts <- type_new_counts %>%
mutate(id_wrapped = wrap_text(id, 10))
type_new_counts <- type_new_counts %>%
mutate(id_wrapped = factor(id_wrapped, levels = type_new_counts %>%
group_by(id_wrapped) %>%
summarize(total_count = sum(count)) %>%
arrange(desc(total_count)) %>%
pull(id_wrapped)))
b4 <- ggplot(type_new_counts, aes(x = id_wrapped, y = count, fill = type_new)) +
geom_bar(stat = "identity", position = position_dodge(width = 0.9), width = 0.7) +
geom_text(aes(label = count), position = position_dodge(width = 0.9), vjust = 0.8, size = 3) + # Add text labels
labs(x = " ", y = " ", title = "Top Names with Multiple Type Relationships") +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 0, hjust = 0.5, size = 8),
legend.position = "bottom"
) +
scale_fill_manual(values = type_new_colors, name = "Type")print(b4)
This bar chart visualises the top names with multiple types of relationships in the network. The x-axis lists the top entities, while the y-axis represents the count of different relationship types, illustrating how some organisations or individuals are central to multiple types of interactions within the network.
Downs Group and Mills, Atkinson and Chavez have the highest number of relationships, indicating their significant roles in various types of network interactions.Which Company has the highest Revenue?
Click to show code
library(plotly)
company_revenue <- mc3_nodes %>%
group_by(id, type_3) %>%
summarise(total_revenue = sum(revenue_omu, na.rm = TRUE), .groups = 'drop')
plot <- plot_ly(data = company_revenue,
x = ~id,
y = ~total_revenue,
text = ~paste("Name:", id,
"<br>Total Revenue:", total_revenue,
"<br>Type:", type_3),
type = "scatter",
mode = "markers",
marker = list(size = ~total_revenue / 1000000,
sizemode = 'area',
sizeref = 0.1,
color = ~total_revenue,
colorscale = "Viridis",
showscale = TRUE))
plot <- plot %>%
layout(title = "Interactive Bubble Chart of Company Revenue",
xaxis = list(title = "", showticklabels = FALSE), # Hide x-axis labels
yaxis = list(title = "Total Revenue"),
hovermode = "closest")plotThis bubble chart visualises the distribution of total revenue for different entities in the network. Each bubble represents an entity, with the size of the bubble corresponding to the total revenue and the colour indicating the magnitude of the revenue. The largest yellow bubble represents
Briggs-Wilson, a company with a total revenue of approximately 310.6 million.Irregular Patterns by Revenue
Extracting year from full date as using the full date format is too detail for analysis.
edges <- edges %>%
mutate(
Start_Year = year(start_date),
End_Year = year(end_date)
)
nodes <- nodes %>%
mutate(added_year = year(date_added))The id with top 8 revenues had been extracted and visualise using heat map.
Click to show code
edges <- edges %>%
mutate(
Start_Year = year(start_date),
End_Year = year(end_date)
)
nodes <- nodes %>%
mutate(added_year = year(date_added))
top_8_revenue_nodes <- nodes %>%
arrange(desc(revenue)) %>%
slice_head(n = 8)
MC2_node_abnor <- edges %>%
filter(id %in% top_8_revenue_nodes$id) %>%
group_by(id, Start_Year) %>%
summarise(weight = n(), .groups = "drop")
MC2_node_abno <- edges %>%
filter(id %in% top_8_revenue_nodes$id) %>%
group_by(id, Start_Year) %>%
summarise(weight = n(), .groups = "drop")
g1 <- ggplot(MC2_node_abnor, aes(Start_Year, id)) +
geom_tile(aes(fill = weight)) +
geom_text(aes(label = weight), size = 3) +
labs(title = "Irregular Pattern by Revenue") +
scale_fill_gradient(low = "white", high = "lightblue") +
theme(axis.text.x = element_text(angle = 0, hjust = 1, size = 8)) +
theme(axis.text.y = element_text(size = 6))print(g1)
This heat map visualises the irregular patterns in revenue for the top 8 revenue companies over time. Each cell represents a specific entity and year, with the numbers indicating the weight or frequency of revenue changes. More details of transaction and relationship examination should be done to check any atypical business activities.
-
Williams, Stanley and Butlet had a unusually high activity count of 9 in 2025.-
Yang PLC had peaks in 2012, 2021 and 2022.-
Short, Hernandez and Myers had irregular activity peaks at 2011 and 2020.-
Roth Ltd had isolated peaks in 2010 and 2025.-
Pearson-Williams had high activity during 2015 to 2017 with a drop off after.-
Dunlap, Fleming and Brown had significant peak in 2025 but low activity overall.-
Cooper, Jacobs and Gonzalez had high activities from 2018 to 2021.-
Briggs-Wilson had peaks in 2010 and 2028 but low activity overall.Irregular Activities of Business
Click to show code
filtered_edges <- edges %>%
filter(!is.na(start_date) & !is.na(end_date))
filtered_edges <- filtered_edges %>%
mutate(year = year(start_date))
activity_counts <- filtered_edges %>%
group_by(year, id, type_new) %>%
summarise(activity_count = n()) %>%
ungroup()
activity_counts <- activity_counts %>%
group_by(type_new) %>%
mutate(
Q1 = quantile(activity_count, 0.25),
Q3 = quantile(activity_count, 0.75),
IQR = Q3 - Q1,
lower_bound = Q1 - 1.5 * IQR,
upper_bound = Q3 + 1.5 * IQR,
is_outlier = activity_count < lower_bound | activity_count > upper_bound
) %>%
ungroup()
o1 <- ggplot(activity_counts, aes(x = year, y = activity_count, color = is_outlier)) +
geom_point() +
geom_text_repel(aes(label = ifelse(is_outlier, id, "")), size = 2.5) +
facet_wrap(~ type_new) +
labs(title = "Activity Counts per Year with Outliers Highlighted",
x = "Year", y = "Activity Count", color = "Outlier") +
theme_minimal()print(o1)
Click to show code
filtered_edges <- edges %>%
filter(!is.na(start_date) & !is.na(end_date))
filtered_edges <- filtered_edges %>%
mutate(year = year(start_date))
activity_counts <- filtered_edges %>%
group_by(year, id, type_new) %>%
summarise(activity_count = n()) %>%
ungroup()
activity_counts <- activity_counts %>%
group_by(type_new) %>%
mutate(
Q1 = quantile(activity_count, 0.25),
Q3 = quantile(activity_count, 0.75),
IQR = Q3 - Q1,
lower_bound = Q1 - 1.5 * IQR,
upper_bound = Q3 + 1.5 * IQR,
is_outlier = activity_count < lower_bound | activity_count > upper_bound
) %>%
ungroup()
ggplot(activity_counts, aes(x = year, y = activity_count, color = is_outlier)) +
geom_point() +
geom_text_repel(aes(label = ifelse(is_outlier, id, "")), size = 2.5) +
facet_wrap(~ type_new, scales = "free_y") + # Using free_y scale for better visualization
labs(title = "Activity Counts per Year with Outliers Highlighted",
x = "Year", y = "Activity Count", color = "Outlier") +
scale_color_manual(values = c("FALSE" = "blue", "TRUE" = "red")) + # Color coding outliers
theme_minimal() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
plot.title = element_text(size = 14, face = "bold")
)
The scatter plot displays activity counts per year, categorised by BeneficialOwnership, Shareholdership, and WorksFor relationships. The y-axis represents the activity count, while the x-axis shows the year. Outliers (TRUE), highlighted in blue, indicate entities with unusually high activity levels and indicates unusual activities.
Stephens-Lopez stands out with multiple outliers around 2019 onwards and indicating a substantial increase in activities.Wright LLC showed increase activities since 2018 onwards and these activities were flagged as outliers indicating a potential unusual ownership transfer or increase in ownership stakes for.Smith-Ramirez showed a sudden rise around 2020Haas-Tran and Blair PLC display outliers around 2030 indicating potential abnormal business activities.outliers_summary <- activity_counts %>%
filter(is_outlier) %>%
select(year, id, type_new, activity_count) %>%
arrange(type_new, year, desc(activity_count))print(outliers_summary)# A tibble: 31 × 4
year id type_new activity_count
<dbl> <chr> <chr> <int>
1 2014 Wright LLC BeneficialOwnership 2
2 2015 Wright LLC BeneficialOwnership 2
3 2017 Nichols-Esparza BeneficialOwnership 2
4 2022 Wright LLC BeneficialOwnership 2
5 2023 Nichols-Esparza BeneficialOwnership 2
6 2027 Smith-Ramirez BeneficialOwnership 4
7 2029 Smith-Ramirez BeneficialOwnership 2
8 2030 Smith-Ramirez BeneficialOwnership 2
9 2031 Smith-Ramirez BeneficialOwnership 5
10 2032 Stephens-Lopez BeneficialOwnership 7
# ℹ 21 more rows
ggplot(outliers_summary, aes(x = year, y = id, fill = activity_count)) +
geom_tile(color = "white") +
scale_fill_gradient(low = "lightblue", high = "red") +
facet_wrap(~ type_new, scales = "free_y") +
labs(title = "Heatmap of Outliers in Activity Counts",
x = "Year", y = "Entity", fill = "Activity Count") +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
plot.title = element_text(size = 14, face = "bold")
)
This heatmap illustrates outliers in activity counts for BeneficialOwnership and Shareholdership relationships. Entities like
Wright LLC and Stephens-Lopez exhibit significant outliers in BeneficialOwnership, while Woods Inc and Weaver-Price show unusual activity in Shareholdership. The activity counts have increased since late 2020, indicating abnormal patterns.Conclusions
The visual analytics approach effectively highlights temporal patterns and changes within companies. However, continuous monitoring and in-depth investigations are necessary to uncover the underlying motivations behind these changes.
Transforming complex data into meaningful visualisations is challenging. Each graph felt like peeling back layers to reveal the intricate web of corporate activities, much like solving a puzzle. This process requires a blend of technical skills and creative thinking, but it ultimately provides valuable insights to better understand the network.